
Show warning message when last user input gets pruned #4816

Open · Jazzcort wants to merge 1 commit into main from Jazzcort/warn-when-truncate-last-msg

Conversation

Jazzcort
Contributor

@Jazzcort Jazzcort commented Mar 25, 2025

Description

If the user's last input is pruned due to context overflow, a warning message will be displayed in the chat section, alerting them that some details may have been lost. As a result, the response they receive might be incomplete or inaccurate due to the truncated input.
Granite-Code/granite-code#22

Screenshots

(screenshot: show-warning-in-chat)

Testing instructions

Set the model’s context length to a small value (e.g., 512) and ask a question that exceeds contextLength - maxTokens in tokens. A warning message will appear at the bottom of the chat section, indicating that some input may have been truncated. Deleting previous messages removes the warning.

@Jazzcort Jazzcort requested a review from a team as a code owner March 25, 2025 18:21
@Jazzcort Jazzcort requested review from RomneyDa and removed request for a team March 25, 2025 18:21

netlify bot commented Mar 25, 2025

Deploy Preview for continuedev canceled.

🔨 Latest commit: f97736e
🔍 Latest deploy log: https://app.netlify.com/sites/continuedev/deploys/67f557b47a4b61000849ebbd

Collaborator

@RomneyDa RomneyDa left a comment

@Jazzcort this could be a great addition, could you explore solutions that avoid injecting a new warning message type into the chat messages, along with a more subtle warning UI?

@Jazzcort
Contributor Author

@RomneyDa I'll try to find another way to send the warning message back to the webview.

@RomneyDa
Collaborator

RomneyDa commented Mar 28, 2025

@Jazzcort I'd be interested in having this sort of "stream warning" idea as well, so feel free to bounce approaches/ideas here before spending too much time on them! I think there could be several different approaches where the warnings aren't persisted to chat history, e.g. passing them with the stream but with a "warning:" field that is captured in the Redux streamUpdate and temporarily added to the UI. Let me know your thoughts.
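A minimal sketch of that idea (the type and field names here are hypothetical, not Continue's actual stream or Redux shapes): each stream update may carry an optional warning string, and the UI state keeps it only transiently rather than writing it into chat history.

```ts
// Hypothetical shapes -- not the actual Continue stream or Redux types.
interface StreamUpdate {
  content?: string; // normal streamed tokens
  warning?: string; // optional warning that should not be saved to chat history
}

interface SessionViewState {
  currentResponse: string;
  transientWarning?: string; // rendered in the UI, cleared when the session changes
}

// Reducer-style handler applied for each incoming stream update.
function applyStreamUpdate(
  state: SessionViewState,
  update: StreamUpdate,
): SessionViewState {
  return {
    currentResponse: state.currentResponse + (update.content ?? ""),
    transientWarning: update.warning ?? state.transientWarning,
  };
}
```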

@Jazzcort
Contributor Author

I've already implemented this second approach in the following branch: Jazzcort/warn-when-truncate-last-msg-v2.

Instead of sending the warning through the stream, I used the messenger system to deliver the warning message. With this approach, I make the pruning behavior occur before calling streamChat so I can reach the messenger reference. The advantage of this approach is that users receive the warning message before the streamed response rather than after it has finished.

Regarding the "warning:" field, are you suggesting adding it to AssistantChatMessage? I think that could work as well! I'm open to either approach—whichever aligns better with the project's design. We can also discuss the UI implementation afterward.
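Roughly, the ordering in that branch could look like the sketch below (the messenger method names, payload shapes, and pruning helper are placeholders for illustration, not the branch's actual code): pruning happens before streaming, so the warning can be sent to the webview before the first token arrives.

```ts
// Hypothetical core-side flow: prune first, warn the webview, then stream.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Messenger {
  send(messageType: string, payload: unknown): void;
}

// Stand-in for the real token-based pruning logic.
function pruneToContextLength(messages: ChatMessage[]): {
  prunedMessages: ChatMessage[];
  lastMessageTruncated: boolean;
} {
  // Real logic would count tokens against contextLength - maxTokens.
  return { prunedMessages: messages, lastMessageTruncated: false };
}

async function handleChatRequest(
  messages: ChatMessage[],
  messenger: Messenger,
  streamChat: (messages: ChatMessage[]) => AsyncGenerator<string>,
): Promise<void> {
  const { prunedMessages, lastMessageTruncated } = pruneToContextLength(messages);

  // The user sees the warning before any tokens of the response arrive.
  if (lastMessageTruncated) {
    messenger.send("showTruncationWarning", {
      message: "Part of your last message was truncated to fit the context length.",
    });
  }

  for await (const chunk of streamChat(prunedMessages)) {
    messenger.send("streamChunk", { chunk });
  }
}
```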

@owtaylor
Contributor

Another idea would be to extend @Jazzcort's last approach and have two separate calls from webview => core.

  1. llm/pruneChat => {prunedMessages, warning?: string}
  2. llm/streamChat

or something like that (sketched below). (llm.streamChat could still prune itself when it's not being invoked from the chat view.)

This would avoid having to worry about the interaction between warnings and streaming. It could also be potentially useful for some other things:

  • prefix-caching sensitive pruning (project writeup: Context stability - taking advantage of prefix caching Granite-Code/granite-code#96)
  • having some way to reveal the pruned messages in the UI. I'm not at all sure that this is a good idea - the user can look at logs - but I do feel that it can be deceptive to have a rich chat history with no indication that only a tiny fraction of it might actually be sent to the model.
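A rough sketch of what that two-call flow could look like from the webview side (only the llm/pruneChat and llm/streamChat names come from the suggestion above; the request/response shapes are placeholders):

```ts
// Hypothetical webview -> core protocol for the two-call approach.
interface ChatMessage {
  role: string;
  content: string;
}

interface PruneChatResponse {
  prunedMessages: ChatMessage[];
  warning?: string; // set when the last user message had to be truncated
}

interface IdeMessenger {
  request(
    type: "llm/pruneChat",
    payload: { messages: ChatMessage[] },
  ): Promise<PruneChatResponse>;
  streamRequest(
    type: "llm/streamChat",
    payload: { messages: ChatMessage[] },
  ): AsyncGenerator<string>;
}

async function sendChat(
  messenger: IdeMessenger,
  messages: ChatMessage[],
  showWarning: (text: string) => void,
): Promise<string> {
  // 1. Prune once, up front, and surface any warning before streaming begins.
  const { prunedMessages, warning } = await messenger.request("llm/pruneChat", { messages });
  if (warning) {
    showWarning(warning);
  }

  // 2. Stream with the already-pruned messages.
  let response = "";
  for await (const chunk of messenger.streamRequest("llm/streamChat", {
    messages: prunedMessages,
  })) {
    response += chunk;
  }
  return response;
}
```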

@Jazzcort
Contributor Author

I agree with @owtaylor's suggestion. Calling llm/pruneChat before llm/streamChat not only helps manage context length but also gives us control over whether llm/streamChat is called at all. And when users see the warning message, they'll know not to put too much weight on the last response.

Additionally, leveraging Context stability - taking advantage of prefix caching is a great strategy. It can enhance the user experience by reducing response time when the context limit is reached. @RomneyDa If this all sounds good, I'll start working on it.

@RomneyDa
Collaborator

RomneyDa commented Apr 1, 2025

Planning to look into this tomorrow midday!

@RomneyDa
Collaborator

RomneyDa commented Apr 2, 2025

@Jazzcort @owtaylor I would agree that two separate calls is a good approach; having the warning up front would be great, and it won't affect all the other uses of streamChat in core, etc. llm/pruneChat could work. I'd be interested in approaches that don't do the pruning twice but still use the same streamChat function, perhaps some kind of alreadyPruned boolean passed to llm.streamChat. Counting tokens can be a bit expensive, but not bad.

@sestinj tagging since this touches core streaming

@Jazzcort
Contributor Author

Jazzcort commented Apr 3, 2025

I'm planning to implement llm/compileChat, which calls compileChatHistory to prune the chat messages we pass in and returns the compiled chat messages along with a boolean indicating whether we should warn users. I'll also add a boolean parameter to llm/streamChat so we won't do the pruning twice. What do you think? @owtaylor @RomneyDa @sestinj
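As a sketch of that plan (the handler and helper names follow the comment above; their exact shapes are assumptions):

```ts
// Hypothetical core-side shapes for the proposed llm/compileChat handler.
interface ChatMessage {
  role: string;
  content: string;
}

interface CompiledChat {
  compiledChatMessages: ChatMessage[];
  didPrune: boolean; // true when the last user message was truncated
}

// Stand-in for the existing pruning logic (compileChatHistory in the plan above).
function compileChatHistory(messages: ChatMessage[]): CompiledChat {
  // Real logic would count tokens and prune to fit contextLength - maxTokens.
  return { compiledChatMessages: messages, didPrune: false };
}

// "llm/compileChat": compile once and report whether the last message was pruned.
function compileChat(messages: ChatMessage[]): CompiledChat {
  return compileChatHistory(messages);
}

// Stand-in for the provider call that actually streams tokens.
async function* callModel(messages: ChatMessage[]): AsyncGenerator<string> {
  yield `(${messages.length} messages sent to the model)`;
}

// "llm/streamChat" gains a flag so pre-compiled messages aren't pruned a second time.
async function* streamChat(
  messages: ChatMessage[],
  alreadyCompiled = false,
): AsyncGenerator<string> {
  const toSend = alreadyCompiled
    ? messages // already went through llm/compileChat in the webview
    : compileChatHistory(messages).compiledChatMessages;
  yield* callModel(toSend);
}
```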

@Jazzcort Jazzcort force-pushed the Jazzcort/warn-when-truncate-last-msg branch 4 times, most recently from 50920c6 to 495ae05 Compare April 7, 2025 18:59
Contributor

@owtaylor owtaylor left a comment

Looks pretty good to me. Just two suggestions.

core/core.ts Outdated

if (completionOptions.model === "o1") {
  completionOptions.stream = false;
}
Contributor

Hmm, duplicating this type of logic into two places is going to make things difficult to maintain in the future. I would suggest making model._compileChatMessages() public [and not _-prefixed] and using that. You could make it call this._modifyCompletionOptions() itself as well, since calling that multiple times shouldn't hurt.

if (lastMessageTruncated) {
  dispatch(
    setWarningMessage(
      "The context has reached its limit. This may lead to less accurate answers.",
Contributor

"the context has reached its limit" sounds like the entire chat is too long. Maybe:

The provided context items are too large. They have been truncated to fit within the model's context length.

It's a bit verbose, but it:

  • Says more specifically what happened
  • Contains the phrase "context length", so that if the user searches for it they can find out what the context length is.

@Jazzcort Jazzcort force-pushed the Jazzcort/warn-when-truncate-last-msg branch from 495ae05 to 173c73a Compare April 7, 2025 19:42
core/core.ts Outdated
const model = await this.configHandler.llmFromTitle(modelName);

options.log = undefined;
const completionOptions: CompletionOptions = mergeJson(
Contributor

Hmm, thinking about it, it's probably better to have a public llm.compileChatMessages() that takes the LLMFullCompletionOptions, and have that and streamChat() call _compileChatMessages() - sorry for not suggesting that on the previous round.
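A sketch of that structure (only the names compileChatMessages, _compileChatMessages, _modifyCompletionOptions, LLMFullCompletionOptions, and the o1 stream tweak come from this discussion; the class layout and private-method bodies below are simplified placeholders):

```ts
// Hypothetical, simplified view of sharing one compile path between the
// "llm/compileChat" handler and streamChat().
interface ChatMessage {
  role: string;
  content: string;
}
interface CompletionOptions {
  model: string;
  maxTokens: number;
  stream?: boolean;
}
type LLMFullCompletionOptions = Partial<CompletionOptions>;

class BaseLLM {
  constructor(private defaults: CompletionOptions) {}

  // Public entry point: merges options, applies model-specific tweaks, compiles once.
  compileChatMessages(
    messages: ChatMessage[],
    options: LLMFullCompletionOptions,
  ): { messages: ChatMessage[]; didPrune: boolean; completionOptions: CompletionOptions } {
    const completionOptions = this._modifyCompletionOptions({ ...this.defaults, ...options });
    const { messages: compiled, didPrune } = this._compileChatMessages(messages, completionOptions);
    return { messages: compiled, didPrune, completionOptions };
  }

  async *streamChat(
    messages: ChatMessage[],
    options: LLMFullCompletionOptions,
  ): AsyncGenerator<string> {
    // Reuses the same public path instead of duplicating compile logic in core.ts.
    const { messages: compiled, completionOptions } = this.compileChatMessages(messages, options);
    yield* this._streamFromProvider(compiled, completionOptions);
  }

  // Placeholder internals -- stand-ins for the real private implementations.
  private _modifyCompletionOptions(options: CompletionOptions): CompletionOptions {
    return options.model === "o1" ? { ...options, stream: false } : options;
  }
  private _compileChatMessages(
    messages: ChatMessage[],
    _options: CompletionOptions,
  ): { messages: ChatMessage[]; didPrune: boolean } {
    return { messages, didPrune: false };
  }
  private async *_streamFromProvider(
    _messages: ChatMessage[],
    _options: CompletionOptions,
  ): AsyncGenerator<string> {
    yield "";
  }
}
```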

@Jazzcort Jazzcort force-pushed the Jazzcort/warn-when-truncate-last-msg branch from 173c73a to 064164a Compare April 7, 2025 20:43
Contributor

@owtaylor owtaylor left a comment

Code looks good to me. @RomneyDa - do you still think the warning needs to be more subtle? From my perspective, when the last message is truncated, the results are unlikely to be good.

@Patrick-Erichsen
Collaborator

Patrick-Erichsen commented Apr 8, 2025

Not to make this a design by committee, but my only feedback is that I think the warning should be yellow rather than red. Red feels a bit too scary for the severity of this warning imo.

Great contribution though, thanks @Jazzcort ! 👌

First, I integrated llm/compileChat to precompile chat messages before
invoking llm/streamChat. The llm/compileChat function returns both the
compiled messages and a boolean flag indicating whether the most
recent user input has been pruned. This allows us to trigger a warning
to notify users when pruning occurs at the last message.

A messageOptions argument is introduced to streamChat as an optional parameter.
If we pass messageOptions and the precompiled attribute is set to true,
streamChat will skip the process of compiling chat messages, ensuring
that the messages won't go through the pruning process twice.
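
A condensed sketch of how the webview side fits together under this design (the messenger method names and the warning dispatch are assumptions for illustration; messageOptions.precompiled and the compileChat return values follow the description above, and the warning text follows the wording suggested earlier in the review):

```ts
// Hypothetical webview-side flow for the approach described in the commit message.
interface ChatMessage {
  role: string;
  content: string;
}

interface CompileChatResult {
  compiledChatMessages: ChatMessage[];
  didPrune: boolean;
}

interface IdeMessenger {
  request(
    type: "llm/compileChat",
    payload: { messages: ChatMessage[] },
  ): Promise<CompileChatResult>;
  streamRequest(
    type: "llm/streamChat",
    payload: { messages: ChatMessage[]; messageOptions?: { precompiled?: boolean } },
  ): AsyncGenerator<string>;
}

async function streamResponse(
  messenger: IdeMessenger,
  messages: ChatMessage[],
  dispatchWarning: (text: string) => void, // e.g. dispatch(setWarningMessage(...)) in Redux
): Promise<string> {
  // Compile (prune) once, before streaming.
  const { compiledChatMessages, didPrune } = await messenger.request("llm/compileChat", {
    messages,
  });

  if (didPrune) {
    dispatchWarning(
      "The provided context items are too large. They have been truncated to fit within the model's context length.",
    );
  }

  // Tell streamChat the messages are already compiled so pruning doesn't run twice.
  let response = "";
  for await (const chunk of messenger.streamRequest("llm/streamChat", {
    messages: compiledChatMessages,
    messageOptions: { precompiled: true },
  })) {
    response += chunk;
  }
  return response;
}
```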
@Jazzcort Jazzcort force-pushed the Jazzcort/warn-when-truncate-last-msg branch from 064164a to f97736e Compare April 8, 2025 17:06
@Jazzcort
Contributor Author

Jazzcort commented Apr 8, 2025

@Patrick-Erichsen I’ve updated the warning message color to yellow—thanks for the feedback!
@RomneyDa Let me know if you'd like to tweak the UI further. We can either make those changes now or go ahead and merge this PR and revisit the design later.
